Image processing apparatus including an image data encoder having at least two scalability modes and method therefor

ABSTRACT

An image processing apparatus and method therefor for presenting an image corresponding to the capability of equipment to which an image is supplied, and the needs of users. The apparatus presents the image by inputting external information representing a desired scalability from external equipment, encoding the image data with the desired scalability according to the external information, and outputting the encoded data to the external equipment.

This application is a division of application Ser. No. 09/389,449, filed Sep. 3, 1999, now U.S. Pat. No. 6,603,883.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an image processing apparatus and method therefor. More specifically, the present invention relates to an image processing apparatus for encoding and decoding image data and to a method of encoding and decoding the same.

2. Related Background Art

JPEG (Joint Photographic Experts Group), H.261, and MPEG (Moving Picture Experts Group), which improved upon H.261, exist as international standards for the encoding of sound and image data. To handle integrated sound and images in the current multi-media age, MPEG has been improved to MPEG1, and MPEG1 has undergone further improvement to MPEG2, both of which are currently in widespread use.

MPEG2 is a standard for moving picture encoding that was developed to respond to demands for high image quality. Specifically:

(1) it can be used for applications ranging from communications to broadcasting, in addition to stored media data,

(2) it can be used for images of much higher quality than standard television, with the possibility of extension to High Definition Television (HDTV),

(3) unlike MPEG1 and H.261, which can only be used with non-interlaced image data, MPEG2 can be used to encode interlaced images,

(4) it possesses scalability, and

(5) an MPEG2 decoder is able to process an MPEG1 bit stream; in other words, it is downwardly compatible.

Of the five characteristics listed, item (4), scalability, is new to MPEG2. It is roughly classified into three types: spatial scalability, temporal scalability, and signal-to-noise ratio (SNR) scalability, which are outlined below.

Spatial Scalability

FIG. 1 shows an outline of spatial scalability encoding. The base layer has a low spatial resolution, while the enhancement layer has a high spatial resolution.

The base layer is produced by spatially sub-sampling the original image at a fixed ratio, lowering the spatial resolution (image quality) and reducing the encoded data volume per frame. In other words, it is a layer with a lower spatial-resolution image quality and less code amount. Encoding takes place using inter-frame prediction encoding within the base layer. This means that the image can be decoded from only the base layer.

On the other hand, the enhancement layer has a high spatial-resolution image quality and a large code amount. The base layer image data is up-sampled (averaging, for example, is used to interpolate a pixel between pixels of the low resolution image, creating a high resolution image) to generate an expanded base layer image with the same size as the enhancement layer. Encoding takes place using not only predictions from images within the enhancement layer, but also predictions taken from the up-sampled expanded image. Therefore it is not possible to decode the image from only the enhancement layer.
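As a reading aid, the layer relationship described above can be sketched as follows. This is a minimal illustration assuming a 2:1 sampling ratio and pixel-repetition up-sampling; the patent's circuits are not limited to these choices.

```python
import numpy as np

def make_base_layer(original: np.ndarray, ratio: int = 2) -> np.ndarray:
    """Spatially sub-sample the original image at a fixed ratio (base layer)."""
    return original[::ratio, ::ratio]

def up_sample(base: np.ndarray, ratio: int = 2) -> np.ndarray:
    """Expand the base layer to enhancement-layer size; pixel repetition
    is used here, though the text mentions averaging as one option."""
    return base.repeat(ratio, axis=0).repeat(ratio, axis=1)

# The enhancement layer carries only the prediction residual against
# the expanded base layer, so both layers together restore full detail.
original = np.arange(480 * 640, dtype=np.int32).reshape(480, 640)
base = make_base_layer(original)          # low spatial resolution
residual = original - up_sample(base)     # enhancement-layer input
```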

By decoding image data of the enhancement layer, encoded as described above, an image with the same spatial size as the original image is obtained, the image quality depending upon the rate of compression.

The use of spatial scalability allows two image sequences to be efficiently encoded, as compared to encoding and sending each image separately.

Temporal Scalability

FIG. 2 shows an outline of temporal scalability encoding. The base layer has a low temporal resolution, while the enhancement layer has a high temporal resolution.

The base layer has a temporal resolution (frame rate) obtained by thinning out the original image sequence on a frame basis at a constant rate, thereby lowering the temporal resolution and reducing the amount of encoded data to be transmitted. In other words, it is a layer with a lower temporal-resolution image quality and less code amount. Encoding takes place using inter-frame prediction encoding within the base layer. This means that the image can be decoded from only the base layer.

On the other hand, the enhancement layer has a high temporal-resolution image quality and a large code amount. Encoding takes place using prediction not only from I, P, and B pictures within the enhancement layer, but also from the base layer image data. Therefore it is not possible to decode the image from only the enhancement layer.

By decoding image data of the enhancement layer, encoded as described above, an image with the same frame rate as the original image is obtained, the image quality depending upon the rate of compression.

Temporal scalability allows, for example, a 30 Hz non-interlaced image and a 60 Hz non-interlaced image to be sent efficiently at the same time.
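The 30 Hz/60 Hz example can be sketched as follows; the even/odd frame split is an assumption for illustration.

```python
# Minimal sketch of the 30 Hz / 60 Hz example above: even-numbered frames
# form the base layer, odd-numbered frames travel in the enhancement layer
# and are predicted from the base layer.
frames = [f"frame{i}" for i in range(8)]   # stand-in for a 60 Hz sequence
base_layer = frames[0::2]                  # 30 Hz, decodable on its own
enhancement_layer = frames[1::2]           # decodable only with the base
```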

Temporal scalability is currently not in use. It is part of a future expansion of MPEG2 (treated as “reserved”).

SNR Scalability

FIG. 3 shows an outline of SNR scalability encoding.

The layer having a low image quality is referred to as the base layer, whereas the layer having a high image quality is referred to as the enhancement layer.

The base layer is provided, in the process of encoding (compressing) the original data (for example, dividing it into blocks, DCT transformation, quantization, and variable length encoding), by compressing the original image at a relatively high compression rate (a coarse quantization step size) to yield less code amount. That is, the base layer is a layer with a low image quality, in terms of S/N image quality, and less code amount. In this base layer, encoding is carried out using MPEG1 or MPEG2 (with predictive encoding) applied to each frame.

On the other hand, the enhancement layer has a higher image quality and larger code amount than the base layer. The enhancement layer is provided by decoding the encoded image of the base layer, subtracting the decoded image from the original image, and intraframe encoding only the subtraction result at a relatively low compression rate (with a quantization step size smaller than in the base layer). This enhancement-layer encoding in SNR scalability takes place entirely within the frame (field); no inter-frame (inter-field) prediction encoding is used, and the entire encoding sequence is performed intra-frame (intra-field).
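The two-pass quantization relationship can be sketched as follows; the step sizes (32 and 4) are arbitrary assumptions for illustration, not values from the standard or the patent.

```python
import numpy as np

def quantize(data: np.ndarray, step: float) -> np.ndarray:
    """Quantize and immediately reconstruct (the lossy round trip)."""
    return np.round(data / step) * step

coefficients = np.linspace(-100.0, 100.0, 64).reshape(8, 8)  # stand-in block

# Base layer: coarse step -> high compression, low S/N image quality.
base = quantize(coefficients, step=32.0)

# Enhancement layer: only the residual left by the base layer is encoded
# again with a finer step, so base + enhancement approaches the original.
enhancement = quantize(coefficients - base, step=4.0)
high_quality = base + enhancement
```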

Using SNR scalability allows two types of images with differing picture quality to be encoded or decoded efficiently at the same time.

However, previous encoding devices have not provided an option to freely select the size of the base layer image in spatial scalability. The image size of the base layer is fixed by the relationship between the enhancement layer and the base layer, and hence is not allowed to vary.

In addition, devices employing temporal scalability have faced a similar limitation. The base layer frame rate is determined uniquely as a function of the enhancement layer and, like the base layer image size, could not be freely selected.

Therefore, previous encoding devices have not allowed one to select the code amount, through factors such as image size and frame rate, when using the scalability function. One could not select any factor directly related to the condition of the decoding device or the lines on the output side.

In other words, when encoded image data is output from an encoding device employing spatial scalability or SNR scalability to a decoding device (receiving side), image quality choices are limited to:

1) a low quality image decoded from the base layer only, or

2) a high quality image provided by decoding both the base layer and the enhancement layer.

Accordingly, there has been no opportunity to select image quality (decoding speed) in accordance with the capabilities of the decoding device or the needs of an individual user, a problem not previously addressed.

In addition, recent advances have taken place in the imaging field related to object encoding. MPEG4, currently being advanced as an imaging technology standard, is a good example. MPEG4 splits one image into a background and several objects which exist in front of that background, and then encodes each of the different parts independently. Object encoding enjoys many benefits.

If the background is a relatively static environment and only some of the objects in the foreground are undergoing motion, then the background and all objects that do not move need not be re-encoded; only the objects that are moving are re-encoded. The amount of code generated by re-encoding, that is, the amount of code generated in encoding the next image frame, is greatly reduced, and transmission of a very high quality image at a low transfer rate can be attained.

In addition, computer graphics (CG) can be used to provide an object image. In this case, the encoder only needs to encode the CG mesh (position and shape change) data, further reducing the transferred code amount.

On the decoder side, the mesh data can be used to construct the image through computation and to incorporate the constructed image into a picture. Using face animation as an example of CG, the eyes, nose, and other object data and their shape change information, received from the encoder, can be used by the decoder to operate on the characteristic data, after which the updating operation to include the new data in the image can be carried out, thereby forming the animation.

Until now, when decoding encoded image data at an image display terminal, the hierarchical degree at which the decoding process takes place has been fixed. For that reason, there has been no selectability or possibility of changing the hierarchy of the object to be displayed. Accordingly, this has not led to high performance processing that matches the processing capabilities of the terminal. Optimal decoding that makes use of the capabilities of the decoder, in relation to encoded image data changing with time from the encoder, has not been possible.

In addition, encoding and decoding of CG data has generally been considered a process that is best handled in software, not hardware, and there are many examples of such software processes. Therefore, if the number of objects within one frame of an image increases, the hardware load on the decoder rapidly increases, and if the objects are face animation or similar CG data, the software load (operation volume, operation time) grows large.

A face object visual standard is defined for the encoding of face images in CG. In MPEG4, a face definition parameter (FDP), defining the shape and texture of the facial image, and a face animation parameter (FAP), used to express the motions of the face, eyebrows, eyelids, eyes, nose, lips, teeth, tongue, cheeks, chin, etc., are used as standards.
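As a rough reading aid only, the division of labor between the two parameter sets might be pictured as follows; the field names are our assumptions and do not reproduce the MPEG4 FDP/FAP syntax.

```python
from dataclasses import dataclass

@dataclass
class FaceDefinitionParameter:
    """Illustrative grouping only: the static description of the face."""
    mesh_vertices: list   # facial shape
    texture: bytes        # facial texture

@dataclass
class FaceAnimationParameter:
    """Illustrative grouping only: per-frame motion of facial features."""
    frame_number: int
    feature_displacements: dict   # e.g. {"eyebrow_left": (dx, dy), ...}
```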

A face animation is made by processing the FDP and FAP data and combining the results, thereby creating a larger load on the decoder than decoding encoded natural image data. Depending on the performance of the decoder, this may lead to obstacles such as an inability to decode, which can in turn lead to image quality problems such as object freezing and incompleteness.

SUMMARY OF THE INVENTION

In view of the background described above, an object of the present invention is to provide an image processing apparatus, and a method used therein, through which image data that satisfies the needs of users and responds to the performance characteristics of the external equipment receiving the image data may be obtained.

According to a preferred embodiment of the present invention, there is provided an image processing apparatus and method therefor wherein external information representing a desired scalability is input from external equipment, image data is then encoded at the desired scalability according to the external information, and the encoded data is output to the external equipment.

According to another preferred embodiment of the present invention, there is provided an image processing apparatus and method therefor wherein image data encoded at a predetermined scalability by external equipment is input, the encoded image data is decoded, and, in order to make the external equipment encode image data at the desired scalability, information representing the desired scalability is output to the external equipment.

According to another preferred embodiment of the present invention, there is provided an image processing apparatus and method therefor, for receiving encoded image data and decoding the encoded image data, wherein the decoding process is controlled according to the encoded image data and the decoding processing capabilities.

Other objects, features, and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a view for illustrating spatial scalability;

FIG. 2 is a view for illustrating temporal scalability;

FIG. 3 is a view for illustrating SNR scalability;

FIG. 4 is a block diagram showing the structure of an encoding device in a first embodiment of the present invention;

FIG. 5 is a block diagram showing the internal structure of a control circuit 103;

FIG. 6 is a block diagram showing the internal structure of a first data generation circuit 105;

FIG. 7 is a block diagram showing the internal structure of a second data generation circuit 106;

FIG. 8 is a block diagram showing a decoding device in the first embodiment of the present invention;

FIG. 9 is a block diagram showing the internal structure of a control circuit 208;

FIG. 10 is a block diagram showing the internal structure of a first data decoding circuit 209;

FIG. 11 is a block diagram showing the internal structure of a second data decoding circuit 210;

FIG. 12 shows an image processing system that contains the functions of the decoding device of FIG. 8;

FIG. 13 is a view showing the selecting operation from a genre title menu;

FIG. 14 shows the condition setting screen that results from title selection in FIG. 13;

FIG. 15 shows the operation of setting further desired conditions after the condition setting screen shown in FIG. 14;

FIG. 16 is a block diagram showing the structure of a decoder in a second embodiment according to the present invention;

FIG. 17 is a block diagram showing the structure of a decoding system in the second embodiment according to the present invention;

FIG. 18 is a block diagram showing the structure of an encoding system in the second embodiment according to the present invention;

FIGS. 19A and 19B show an example of image data sub-sampling according to the present invention;

FIG. 20 shows a flowchart of the processing that takes place in the decoder of the second embodiment of the present invention;

FIG. 21 is a block diagram showing the structure of a decoder in a third embodiment according to the present invention; and

FIG. 22 is a flowchart showing the processing that takes place in the decoder of the third embodiment according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A first embodiment of the present invention is an encoding device 100 shown in FIG. 4.

The encoding device 100 is comprised of a conversion circuit 101 that is supplied with R, G, and B data each having 8 bits, and a first frame memory 102 that is supplied with the output of the conversion circuit 101. In addition, it is comprised of a first data generation circuit 105 and a first block forming processing circuit 107, which are both supplied with the output of the first frame memory 102, and a first encoding circuit 109 that is supplied with the output of the first block forming processing circuit 107. The first block forming processing circuit 107 is also supplied with the output of the first data generation circuit 105.

In addition, the encoding device 100 is further comprised of a second frame memory 104 that is supplied with the output of the conversion circuit 101, and a second data generation circuit 106 and a second block forming processing circuit 108, which are both supplied with the output of the second frame memory 104. It is also comprised of a second encoding circuit 110 that is supplied with the output of the second block forming processing circuit 108. The second block forming processing circuit 108 is also supplied with the output of the second data generation circuit 106.

The output from the first data generation circuit 105 is also provided to the second data generation circuit 106, while the output from the first encoding circuit 109 is similarly provided to the second encoding circuit 110.

The encoding device 100 is further comprised of a bit stream generation circuit 111, which is supplied with the outputs of the first encoding circuit 109 and the second encoding circuit 110, a recording circuit 114 that records the data output by the bit stream generation circuit 111 onto a storage medium (for example, a hard disk, video tape, etc.), and a control circuit 103, which controls the entire apparatus.

The internal structure of the control circuit 103 is shown in FIG. 5, and consists of a CPU 701, a program memory 702 that stores the process programs necessary to control the entire apparatus and is readable by the CPU 701, and an information detection circuit 703 that is supplied with external information 112, described in detail later. The external information 112, which consists of infrastructure information, user requests, and other data from outside the encoding device, is also supplied to the CPU 701.

Accordingly, the CPU 701 reads out the programs that control processes from the program memory 702 and executes the read-out programs, thereby realizing the operation of the encoding device 100.

FIG. 6 shows the internal structure of the first data generation circuit 105. The first data generation circuit 105 comprises a first selector 301 supplied with the output of the first frame memory 102 (YCbCr data), and a second selector 303 and a sampling circuit 304, both supplied with the output of the first selector 301. In addition, the first data generation circuit 105 further includes a frame rate controller 305 supplied with the output of the second selector 303, and a third selector 302 which is supplied with the outputs of both the frame rate controller 305 and the sampling circuit 304. The output of the third selector 302 is provided to the first block forming processing circuit 107 and the second data generation circuit 106.

As shown in FIG. 7, the second data generation circuit 106 has an internal structure consisting of a first selector 401 supplied with the output from the second frame memory 104 (YCbCr data), and a frame memory 405 supplied with the output from the first data generation circuit 105 (base layer image data). In addition, the second data generation circuit 106 further includes a first difference data generation circuit 403 and a second difference data generation circuit 404, both supplied with the outputs from the first selector 401 and the frame memory 405, and a second selector 402 supplied with the outputs of the first difference data generation circuit 403 and the second difference data generation circuit 404.

Additionally, the output from the second selector 402 is supplied to the second block forming processing circuit 108.

In the encoding device 100 as described above, the input image data (8 bit RGB data) is first converted to 4:2:0 YCbCr data (each component having 8 bits) by the conversion circuit 101, and this converted data is sent to the first frame memory 102 and the second frame memory 104.

Each of the first frame memory 102 and the second frame memory 104 stores the converted YCbCr data output by the conversion circuit 101, and this storing operation is controlled by the control circuit 103, which operates as follows.

That is, the information detection circuit 703 inside the control circuit 103 (refer to FIG. 5) interprets the external information 112, and provides control information corresponding thereto to the CPU 701.

The CPU 701 then uses the control information provided by the information detection circuit 703 to obtain information such as mode information regarding use and non-use of the scalability function in encoding, information as to the type of scalability function to be used, and various control information related to the base layer and the enhancement layer (for example, base layer image size, frame rate, compression ratio, etc.). All of the obtained information (referred to as an encoding control signal hereinafter) is sent from the CPU 701 to both the first data generation circuit 105 and the second data generation circuit 106.
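The patent does not give a concrete format for the encoding control signal; purely as an illustration, the items listed above could be grouped as in the following hypothetical sketch.

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    SPATIAL = 1
    TEMPORAL = 2
    SNR = 3
    NON_SCALABLE = 4

@dataclass
class EncodingControlSignal:
    """Hypothetical grouping of the items the CPU 701 extracts;
    the patent does not specify a concrete format."""
    mode: Mode                # scalability type (or none)
    base_image_size: tuple    # base layer width, height
    base_frame_rate: float    # frames per second
    compression_ratio: float  # target code amount
```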

Simultaneously, the CPU 701 provides the first frame memory 102 and the second frame memory 104 with Read/Write (R/W) control signals. This allows the reading and writing operations in the first frame memory 102 and the second frame memory 104 to link with the functions of both the first data generation circuit 105 and the second data generation circuit 106.

Therefore the first frame memory 102 and the second frame memory 104 operate according to R/W control signals based upon the external information 112. The first data generation circuit 105 and the second data generation circuit 106 operate similarly, using an encoding control signal also based upon the external information 112.

An explanation of the operation of the circuits downstream of the conversion circuit 101 is given below, based upon what is determined by the external information 112, especially the operational mode. In the explanation, the operation of each circuit is described for each of the spatial scalability mode, the temporal scalability mode, the SNR scalability mode, and the non-scalability mode.

Spatial Scalability Mode

The first frame memory 102 and the second frame memory 104, respectively, perform read/write operations on the YCbCr data from the conversion circuit 101, in accordance with the R/W control signal (the control signal based on the external information 112 and specifying the spatial scalability mode) provided by the control circuit 103 (specifically, the CPU 701).

The YCbCr data read out from the first frame memory 102 and the second frame memory 104 is passed through the first data generation circuit 105 and the second data generation circuit 106 and then provided to the first block forming processing circuit 107 and the second block forming processing circuit 108.

At this time, the first data generation circuit 105 and the second data generation circuit 106 are both supplied with encoding control signals (control signals based upon the external information 112 and specifying the spatial scalability mode). Both the first data generation circuit 105 and the second data generation circuit 106 perform their operations in accordance with those control signals.

In the first data generation circuit 105 (refer to FIG. 6), the first selector 301 switches its output to the sampling circuit 304 according to the encoding control signal received from the control circuit 103, so that the YCbCr data from the first frame memory 102 is output to the sampling circuit 304.

The sampling circuit 304 generates the base layer image data by compressing (sub-sampling) the YCbCr image data received from the first selector 301, in accordance with the sub-sampling size information included in the encoding control signal from the control circuit 103. The base layer image data generated by the sampling circuit 304 is then supplied to the third selector 302.

The third selector 302 then switches its output to the output (the base layer image data) of the sampling circuit 304, according to the encoding control signal from the control circuit 103. The base layer image data is therefore supplied to the first block forming processing circuit 107. The base layer image data is also supplied to the second data generation circuit 106, explained later.

The base layer image data, supplied to the first block forming processing circuit 107 from the first data generation circuit 105, is divided into blocks by the first block forming processing circuit 107. The predetermined encoding processing is then performed on the base layer image data in block units by the first encoding circuit 109, and the encoded data is supplied to the bit stream generation circuit 111.

In the second data generation circuit 106 (refer to FIG. 7), the first selector 401 switches its output over to the first difference data generation circuit 403, in accordance with the encoding control signal from the control circuit 103, to output the YCbCr data received from the second frame memory 104.

At the same time, the frame memory 405 supplies the base layer image data from the first data generation circuit 105 to the first difference data generation circuit 403, in accordance with the encoding control signal from the control circuit 103.

The first difference data generation circuit 403 up-samples the base layer image data from the frame memory 405 on a frame or field basis, according to the encoding control signal from the control circuit 103, to obtain the same size as the original image (an image of the enhancement layer), and thereby generates the image difference data between the image data of the enhancement layer and the up-sampled image data.

The image difference data generated by the first difference data generation circuit 403 is then supplied to the second selector 402.

The second selector 402 switches its output to the output (the image difference data) of the first difference data generation circuit 403 according to the encoding control signal from the control circuit 103. Thus the image difference data is supplied to the second block forming processing circuit 108.

The image difference data of the enhancement layer, which is supplied in this way to the second block forming processing circuit 108 from the second data generation circuit 106, is divided into blocks by the second block forming processing circuit 108. The divided data, independent of the base layer image data, then undergoes the predetermined encoding processing in block units by the second encoding circuit 110. The result is then supplied to the bit stream generation circuit 111.

The bit stream generation circuit 111 then attaches a suitable header, corresponding to a predetermined application (transmission, storage), to the base layer image data supplied by the first encoding circuit 109 and the enhancement layer image data (image difference data) supplied by the second encoding circuit 110, combines them into one bit stream of scalable image data, and outputs the formed bit stream externally.
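A minimal sketch of this header-attachment step follows; the header layout shown is invented for illustration and is not the patent's or MPEG2's bit stream syntax.

```python
def build_bitstream(base_coded: bytes, enh_coded: bytes,
                    application: str = "store") -> bytes:
    """Combine the two encoded layers into one stream behind a header.
    The header format here is an illustrative assumption."""
    header = f"{application}|{len(base_coded)}|{len(enh_coded)}|".encode()
    return header + base_coded + enh_coded
```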

Temporal Scalability Mode

In the temporal scalability mode, initially, in a way similar to the spatial scalability mode described above, the YCbCr data read out from the first frame memory 102 and the second frame memory 104 is passed through the first data generation circuit 105 and the second data generation circuit 106, and then provided to the first block forming processing circuit 107 and the second block forming processing circuit 108. However, the operation of the first data generation circuit 105 and the second data generation circuit 106 differs from that in the spatial scalability mode described above.

That is, in the first data generation circuit 105 (refer to FIG. 6), the first selector 301 switches its output to the second selector 303, according to the encoding control signal from the control circuit 103 (a control signal specifying the temporal scalability mode based on the external information 112), to output the YCbCr data from the first frame memory 102.

The second selector 303 then supplies the YCbCr data, received from the first selector 301, to the frame rate controller 305, in accordance with the encoding control signal from the control circuit 103.

The frame rate controller 305 generates the base layer image data by down-sampling the YCbCr data from the second selector 303 on a frame basis (reducing the resolution of the image data along the time axis), in accordance with the frame rate information contained in the encoding control signal from the control circuit 103.

The base layer image data generated by the frame rate controller 305 is then supplied to the third selector 302.

The third selector 302 then switches over its output to the output (the base layer image data) of the frame rate controller 305, according to the encoding control signal from the control circuit 103. The base layer image data is therefore supplied to the first block forming processing circuit 107. The base layer image data is also supplied to the second data generation circuit 106, explained later.

The base layer image data, supplied in this way to the first block forming processing circuit 107 from the first data generation circuit 105, is divided into blocks by the first block forming processing circuit 107. The predetermined encoding processing is then performed on the divided data in block units by the first encoding circuit 109, and the encoded data is supplied to the bit stream generation circuit 111.

In the second data generation circuit 106 (refer to FIG. 7), the first selector 401 switches over its output to the second difference data generation circuit 404, according to the encoding control signal from the control circuit 103, and then outputs the YCbCr data received from the second frame memory 104.

At the same time, the frame memory 405 supplies the base layer image data from the first data generation circuit 105 to the second difference data generation circuit 404, in accordance with the encoding control signal from the control circuit 103.

The second difference data generation circuit 404 generates the image difference data to be encoded as the enhancement layer by referring, in accordance with the encoding control signal from the control circuit 103, to the base layer image data from the frame memory 405 as prediction information for the enhancement layer, using image data backward and forward along the time axis.

The image difference data generated by the second difference data generation circuit 404 is then supplied to the second selector 402.

The second selector 402 switches its output to the output (the image difference data) of the second difference data generation circuit 404, in accordance with the encoding control signal from the control circuit 103. Thus the image difference data is supplied to the second block forming processing circuit 108.

The enhancement layer image difference data, which is thus supplied to the second block forming processing circuit 108 from the second data generation circuit 106, is divided into blocks by the second block forming processing circuit 108. The divided data, independent of the base layer image data, then undergoes the encoding processing in block units by the second encoding circuit 110. The result is then supplied to the bit stream generation circuit 111.

The bit stream generation circuit 111, as in the spatial scalability mode described above, then attaches a suitable header to the base layer image data supplied by the first encoding circuit 109 and the enhancement layer image data (image difference data) supplied by the second encoding circuit 110, to form a bit stream of scalable image data, and outputs the formed bit stream externally.

SNR Scalability Mode

The first frame memory 102 and the second frame memory 104, respectively, perform read/write operations on the YCbCr data from the conversion circuit 101, in accordance with the R/W control signals (the control signals specifying the SNR scalability mode based on the external information 112) provided by the control circuit 103 (specifically, the CPU 701).

In this case, the YCbCr data read out from the first frame memory 102 and the second frame memory 104 is supplied directly to the first block forming processing circuit 107 and the second block forming processing circuit 108.

Next, the YCbCr data is divided into blocks by the first block forming processing circuit 107 and the second block forming processing circuit 108, and then supplied to the first encoding circuit 109 and the second encoding circuit 110.

In accordance with the encoding control signal from the control circuit 103, the first encoding circuit 109 generates encoded base layer image data by performing the predetermined encoding processing, in block units, on the YCbCr data supplied by the first block forming processing circuit 107. The encoding processing is performed so as to attain a predetermined code amount (compression ratio) based on the encoding control signal.

The encoded base layer image data from the first encoding circuit 109 is supplied to the bit stream generation circuit 111 and also supplied to the second encoding circuit 110 as a reference for the encoding processing of the enhancement layer image data.

The second encoding circuit 110 generates the image difference data to be encoded as the enhancement layer by referring, in accordance with the encoding control signal from the control circuit 103, to the base layer image data from the first encoding circuit 109 as prediction information for the enhancement layer, with respect to both past and future image data.

The encoded enhancement layer (image difference data) obtained by the second encoding circuit 110 is then supplied to the bit stream generation circuit 111.

In a manner similar to the spatial scalability and temporal scalability modes described above, the bit stream generation circuit 111 attaches a header to the base layer image data from the first encoding circuit 109 and the enhancement layer image data (image difference data) from the second encoding circuit 110, to generate a bit stream of scalable image data, and outputs the generated bit stream externally.

Non-Scalability Mode

The first frame memory 102 and the second frame memory 104, respectively, perform read/write operations on the YCbCr data from the conversion circuit 101, in accordance with the R/W control signals (the control signals specifying the non-scalability mode based on the external information 112) provided by the control circuit 103 (specifically, the CPU 701).

In this case, the YCbCr data read out from the first frame memory 102 and the second frame memory 104 is supplied directly to the first block forming processing circuit 107 and the second block forming processing circuit 108.

The YCbCr data is then divided into blocks by the first block forming processing circuit 107 and the second block forming processing circuit 108, and undergoes the predetermined encoding processing in block units in the first encoding circuit 109 and the second encoding circuit 110. The encoded data is then supplied to the bit stream generation circuit 111.

The bit stream generation circuit 111 then attaches a suitable header, corresponding to a predetermined application (transmission, storage), to the respective data supplied by both the first encoding circuit 109 and the second encoding circuit 110, to form a bit stream of the image data, and outputs the formed bit stream externally.

An explanation of the decoding device follows. The decoding device is used to decode the encoded data generated by the encoding device described above. FIG. 8 shows the block diagram of a decoding device 200 to which the present invention is applied.

The decoding device 200 corresponds to the encoding device 100 of the first embodiment of the present invention.

In other words, the decoding device 200 performs the reverse processing of the encoding device 100. In particular, user information (provided by a user), described below, can be input into the decoding device 200. This user information includes various information such as the desired image quality and the capabilities of the decoding device 200, for example.

Therefore, users of the decoding device 200 may input various information related to the decoding, which causes a control circuit 208 to generate external output information 212 based on the user input. This external output information is supplied to the encoding device 100 as the external information 112, explained above.

A detailed explanation of setting the user information follows; however, since the decoding processing can be taken as the exact opposite of the encoding processing, the explanation of the decoding processing is omitted here. In addition, explanations of the following FIGS. 9 to 11 are omitted because the circuits shown in those figures operate in a manner exactly opposite to the corresponding circuits in the encoding device 100. FIG. 9 shows the internal structure of the control circuit 208, FIG. 10 shows the internal structure of a first data decoding circuit 209 in the decoding device 200, and FIG. 11 shows the internal structure of a second data decoding circuit 210 in the decoding device 200.

The input method of the user information is explained next.

FIG. 12 shows the structure of a system 240 that has the functions of the decoding device 200 of FIG. 8.

As FIG. 12 shows, the system 240 comprises a monitor 241, a personal computer (PC) body 242, and a mouse 243, which are connected to each other.

The PC 242 contains the functions of the decoding device 200 shown in FIG. 8.

First, a genre selection menu screen for selectable software (moving pictures) is displayed on the monitor 241 of the system 240. For example, “movie”, “music”, “photo”, as well as “etc.” are displayed on the menu screen.

The user operates the mouse 243 and specifies the desired software genre from those displayed on the monitor screen. Specifically, for example, the mouse cursor 244 is lined up with the desired software genre (“movie” in FIG. 12), and the mouse 243 is clicked or double-clicked. This operation designates the “movie” genre.

After this operation is finished, a menu screen such as that shown in FIG. 13 is displayed. This menu screen lists the individual titles corresponding to the genre (“movie”) set at the genre selection menu of FIG. 12. For example, the title menu displayed lists “title-A”, “title-B”, “title-C”, and “title-D”, corresponding to individual movies.

The user operates the mouse 243 and designates the desired title from those displayed on the screen. Specifically, for example, the mouse cursor 244 is lined up with the desired title (“title-A” in FIG. 13), and the mouse 243 is clicked or double-clicked. This operation designates the “title-A” title.

After this operation is finished, a condition setting screen such as that shown in FIG. 14 is displayed. This condition setting screen is for setting various conditions for decoding the data of “title-A” designated at the title selection menu of FIG. 13. In the present embodiment, the following conditions may be set:

- S/N: designate one of low image quality (Low), high image quality (High), and optimal image quality based upon the system's decoding capabilities (Auto),
- Frame Rate: designate one of low frame rate (Low), high frame rate (Full), and an optimal frame rate based upon the system's decoding capabilities (Auto),
- Full Spec: designate the highest image quality (high encoding volume) for the encoder (the encoding device 100 of FIG. 4), and
- Full Auto: set various optimal conditions based upon the system's decoding capabilities.

Therefore, as shown in FIG. 14, the user moves the mouse cursor 244 to line up with the desired condition to be set (“S/N” in FIG. 14), and clicks or double-clicks the mouse 243. This causes a detailed S/N condition menu to be displayed, as shown in FIG. 15. “Low”, “High”, and “Auto” are displayed as the conditions that may be set.

The user then moves the mouse cursor 244 to line up with the desired S/N setting (“Auto” in FIG. 15), and clicks or double-clicks the mouse 243. This selects the “Auto” setting for “S/N”, meaning that the system 240 will automatically set the optimal image quality based on its decoding capabilities.

The information about each of the conditions set on the screens described above is supplied as the external information 112 to the encoding device 100, described above in the first embodiment of the present invention.

As described above, the encoding device 100, receiving the external information 112, interprets it, selects the optimal scalability, determines the settings for each condition required for that scalability (image size, compression ratio, etc.), performs the encoding processing, and outputs the result to the system 240 (decoding device 200) of FIG. 12.
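One way to picture how the menu settings could steer the encoder is sketched below; the dictionary keys and the decision rule are illustrative assumptions, not the patent's selection logic.

```python
def choose_encoding(settings: dict, fast_decoder: bool) -> dict:
    """Sketch of how the external information 112 might steer the encoder;
    the keys and the decision rule are illustrative assumptions."""
    sn = settings.get("S/N", "Auto")
    rate = settings.get("Frame Rate", "Auto")
    if settings.get("Full Auto") or sn == "Auto":
        sn = "High" if fast_decoder else "Low"    # resolve Auto from capability
    if settings.get("Full Auto") or rate == "Auto":
        rate = "Full" if fast_decoder else "Low"
    return {"s/n": sn, "frame_rate": rate}

# Example: a slow terminal leaving both settings on Auto.
print(choose_encoding({"S/N": "Auto", "Frame Rate": "Auto"}, fast_decoder=False))
```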

FIG. 16 shows a block diagram of the structure of a decoder in a second embodiment according to the present invention.

In FIG. 16, a variable length code decoder 1101 performs variable length code decoding on coded image information that is input, and an inverse quantizer 1102 performs inverse quantization on the decoded data output from the variable length code decoder 1101. An inverse DCT unit 1103 performs inverse DCT processing on the inverse-quantized data output from the inverse quantizer 1102.

A selector 1104, a selector 1109, and a selector 1110 switch the input data under control of a decoder control unit 1112. An average value calculation unit 1105 calculates average values between data stored in a memory #1 (1107) and a memory #2 (1108). An adder 1106 performs addition operations on the inverse DCT data output from the inverse DCT unit 1103 and the data output from the selector 1104.

The memories 1107 and 1108, which act as data buffers for the decoded signal, store the data output from the selector 1109. An output buffer 1111 stores the sub-sampled data output from a sub-sampling unit 1113. The decoder control unit 1112 controls the sub-sampling unit 1113, as well as the selectors 1104, 1109, and 1110. The sub-sampling unit 1113 performs sub-sampling operations on the decoded image data to be stored in the output buffer 1111.

The decoding system, which includes the decoder of FIG. 16, will now be explained with reference to FIG. 17.

FIG. 17 shows a block diagram of the structure of a decoding system in the second embodiment according to the present invention.

A hierarchy separation unit 1201 shown in FIG. 17 interprets the header information in the bit stream, which also includes the encoded image information, and then separates each frame (picture) into hierarchies (objects). A header decoder 1202 decodes the separated header information from the hierarchy separation unit 1201 and interprets the decoded header to provide control information to a decoder group 1203 comprising decoders as shown in FIG. 16. The decoder group 1203 decodes the encoded image information that has been separated into object units by the hierarchy separation unit 1201.

A CG construction unit 1205 receives encoded CG information and reconstructs face animation and other CG images. The CG construction unit 1205 has the function of constructing CG images by texture mapping or polygon processing in software. An object synthesization unit 1204 constructs a single picture (frame) by synthesizing each decoded object.

The encoding system corresponding to the decoding system will be explained with reference to FIG. 18.

FIG. 18 shows a block diagram of the structure of the encoding system in the second embodiment according to the present invention.

A VOP defining unit 1301 is shown in FIG. 18. The VOP (Video Object Plane) defining unit 1301 separates (cuts out) a digital image, in units of a single picture (frame or field), into a plurality of objects. An encoder group 1302 performs independent encoding of each object separated by the VOP defining unit 1301.

A multiplexer 1303 gathers each of the encoded objects from the encoder group 1302 into a single bit stream. A CG encoder 1304 encodes the CG image mesh information (location, shape).

Each of the decoders (one per object unit) that make up the decoder group 1203 of the decoding system shown in FIG. 17 is the decoder shown in FIG. 16, apart from the CG construction unit 1205, and each decoder has the same specifications. The CG construction unit 1205 is basically constructed of software that generates the CG images, and a texture image library of the component parts that make up the images.

FIGS. 16 and 17 will next be used to explain the operation of the decoder system.

As FIG. 17 shows, the input bit stream is separated into encoded image information, header information, and encoded CG information by the hierarchy separation unit 1201. The encoded image information is input to the decoder group 1203, the header information is input to the header decoder 1202, and the encoded CG information is input to the CG construction unit 1205. Each is then decoded. The header information decoded by the header decoder 1202 is input into the decoder group 1203, where it is used as control information for the various functions of the decoders. In addition, when encoded CG information is input into the CG construction unit 1205, a CG image (face animation, etc.) is constructed by calculating the texture shapes in accordance with the input information and arranging the calculated shapes on a mesh, etc.

An explanation of the processing that takes place in the decoder group 1203 is given below, with reference to FIG. 16.

Encoded image information is input into the variable length code decoder 1101, and control information (header information) is input to the decoder control unit 1112. The decoder control unit 1112 generates a control signal for controlling the various functions of the decoder, using the control information (header information) and information about the available space in the output buffer 1111, to control the selectors 1104, 1109, and 1110 and the sub-sampling method used in the sub-sampling unit 1113.

The encoded image information is processed as follows. Variable length codes are decoded by the variable length code decoder 1101, inverse quantization processing is performed on the decoded codes by the inverse quantizer 1102, and then inverse DCT processing is done by the inverse DCT unit 1103. If the header information input to the decoder control unit 1112 shows that the decoding mode for the image data currently being processed is “intra”, the decoder control unit 1112 sets the selector 1104 to IV, leaves the selector 1109 in its present state, and sets the selector 1110 to either (b) or (c). In this case, with the selector 1104 set to IV (which supplies the value zero), the inverse DCT processed image data is stored in the memory #1 (1107) or the memory #2 (1108) as it is.

On the other hand, if the header information shows that the decoding mode for the image data currently being processed is “inter (forward prediction)”, the decoder control unit 1112 sets the selector 1104 to either I or III, sets the selector 1109 to either (2) or (1) (if the selector 1104 is set to I, then to (2); if it is set to III, then to (1)), and sets the selector 1110 to either (b) or (c) ((b) if the selector 1109 is set to (1), (c) if it is set to (2)). Then the decoded reference image data, stored in either the memory #1 (1107) or the memory #2 (1108), is read out in accordance with the motion vector and added to the inverse DCT processed image data by the adder 1106. This completes the decoding of the image data.

The completely decoded image data is then stored in the memory #2 (1108) (the selector 1109 set to (2)) if the reference image data used for decoding was read out from the memory #1 (1107) (selector 1104 set to I). If, however, the reference image data was read out from the memory #2 (1108), the decoded image data is stored in the memory #1 (1107) (selector 1109 set to (1)). At the same time, the decoded image data is output to the sub-sampling unit 1113 and the output buffer 1111 via the selector 1110 (contact point (c) or (b)).

Further, if the header information shows that the decoding mode for the image data currently being processed is “inter (bi-directional prediction)”, the decoder control unit 1112 sets the selector 1104 to II, sets the selector 1110 to (a), and leaves the selector 1109 in its present state. Then the decoded reference image data, stored in each of the memory #1 (1107) and the memory #2 (1108), is read out in accordance with the motion vectors, and the average of the two read-out data is calculated by the average value calculation unit 1105. This average is output from the selector 1104 (contact point II) and added to the inverse DCT processed image data by the adder 1106, thereby completing the image data decoding. The decoded data is then output to the sub-sampling unit 1113 and the output buffer 1111, via the selector 1110 (contact point (a)). Note that image data decoded by bi-directional prediction is not used by any further decoding processes, and is therefore not stored in either the memory #1 (1107) or the memory #2 (1108).
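The three decoding modes and the selector behavior can be summarized in the following sketch; the function and variable names are ours, and motion compensation is reduced to a stub.

```python
import numpy as np

def fetch(memory, mv):
    """Read a reference block from memory; a real decoder would apply
    the motion vector mv, which is omitted in this sketch."""
    return memory

def decode_block(idct_block, mode, mem1, mem2, mv=(0, 0)):
    """Sketch of the selector logic described above (names are ours,
    not the patent's). Returns the decoded block and the new references."""
    if mode == "intra":                       # selector 1104 at IV: add zero
        prediction = np.zeros_like(idct_block)
    elif mode == "forward":                   # selector 1104 at I or III
        prediction = fetch(mem1, mv)
    else:                                     # "bidir": contact II, averaged
        prediction = (fetch(mem1, mv) + fetch(mem2, mv)) / 2
    decoded = idct_block + prediction         # adder 1106
    if mode != "bidir":                       # B-data is never kept as reference
        mem1, mem2 = mem2, decoded            # alternate memories #1 and #2
    return decoded, mem1, mem2
```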

The above sequential processing stores the decoded image data in the output buffer 1111, from which it can then be read out to a CRT or other display device at the rate the device requires.

The amount of decoded image data will generally change with time, and the available space in the output buffer 1111 will change in tandem with that amount. The decoder control unit 1112 regularly monitors the available space in the output buffer 1111, and if it determines that an overflow may occur, it instructs the sub-sampling unit 1113 to perform optional sub-sampling on the decoded image data, thereby avoiding overflow of the output buffer 1111.

In addition, the decoder control unit 1112 also monitors the header information of the image data to be decoded. If the amount of encoded image information increases rapidly, it determines that the amount of image data stored in the output buffer 1111 may rapidly rise, and once again instructs that optional sub-sampling be performed on the decoded image data.
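Both overflow triggers described above can be condensed into a sketch like the following; the threshold and surge figures are arbitrary assumptions for illustration.

```python
def needs_subsampling(buffer_fill: int, buffer_size: int,
                      coded_frame_bits: int,
                      threshold: float = 0.9,
                      surge_bits: int = 500_000) -> bool:
    """Sketch of the two triggers: the output buffer nearly full, or a
    rapid rise in the amount of encoded image information."""
    nearly_full = buffer_fill / buffer_size > threshold
    input_surge = coded_frame_bits > surge_bits
    return nearly_full or input_surge   # True -> enable sub-sampling
```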

Sub-sampling is explained next, with reference to FIGS. 19A and 19B.

FIG. 19A shows the image to be thinned out, and FIG. 19B shows the thinned-out image. The thinning-out process removes every other pixel on each horizontal line of the image, reversing the thinning-out phase every other line, thereby reducing the number of horizontal pixels by one-half (halving the horizontal resolution).
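A sketch of this thinning-out process, assuming even image dimensions, is given below.

```python
import numpy as np

def thin_out(image: np.ndarray) -> np.ndarray:
    """Remove every other pixel per line, with the phase reversed on
    alternate lines (a checkerboard pattern), halving horizontal size."""
    h, w = image.shape
    assert w % 2 == 0, "sketch assumes an even width"
    out = np.empty((h, w // 2), dtype=image.dtype)
    out[0::2] = image[0::2, 0::2]   # even lines keep even columns
    out[1::2] = image[1::2, 1::2]   # odd lines keep odd columns
    return out
```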

A post filter (interference removal filter) is disposed after the sub-sampling unit 1113 to eliminate interference caused by spatial frequencies upon sub-sampling. The sub-sampling processing that thins out the decoded image data to avoid overflow of the output buffer 1111 takes place in object units in each of the decoders of the decoder group 1203 in FIG. 17. In addition, since sub-sampling is performed in object units, if the decoder control unit 1112 determines that the cause of overflow in the output buffer 1111 has been eliminated, it is programmed to stop sub-sampling a predetermined delay after that determination.

Next, the process flow that occurs inside the decoder in the second embodiment is explained with reference to FIG. 20.

FIG. 20 shows a flowchart of the processing that takes place in the decoder of the second embodiment according to the present invention.

First, the bit stream is input in a step S101. In a step S102, the input bit stream is separated into header information, encoded image information, and encoded CG information, and the encoded image information is decoded according to the decoding mode designated by the header information. A step S103 checks whether or not there is a possibility that the output buffer 1111, which will store the decoded image data, is about to overflow. If there is a possibility of overflow (YES in step S103), the processing proceeds to a step S104, where sub-sampling of the decoded image data takes place. If there is no possibility of overflow (NO in step S103), the processing is finished.

As explained above, with the second embodiment of the present invention, if the output buffer 1111 appears to be approaching an overflow condition during the decoding processing of the input bit stream, sub-sampling is performed immediately, until the amount of decoded image data stored in the output buffer 1111 is reduced. The temporary sacrifice in spatial resolution of the decoded image is used to avoid an interruption of the decoding processing or an accompanying mix-up of decoded images.

Next, decoding processing where the bit stream employs scalability is described as a third embodiment of the present invention.

FIG. 21 shows a block diagram of the structure of a decoder of the third embodiment according to the present invention.

In FIG. 21, a decoding unit 1701 has the same construction as that shown in FIG. 16, although the sub-sampling unit 1113 is not necessary. A control unit 1702 controls each component of the decoder. A selector 1703 and a selector 1708 both perform switching functions on the input data.

A spatial scalability enhancement layer generation unit 1704 generates the enhancement layer image during spatial scalability operation. A temporal scalability enhancement layer generation unit 1705 performs a similar function during temporal scalability operation by generating the enhancement layer image. A base layer generation unit 1706 generates the base layer image for both spatial scalability and temporal scalability operation. A resolution selector 1707 switches the input data. Finally, a selection signal 1709 is the input signal provided by the user.

The decoding system employed in the third embodiment of the present invention has the same structure as that shown in FIG. 17, with the decoder group 1203 employing the decoder explained with reference to FIG. 21. In addition, the functions of each decoder and of the CG construction unit 1205 are realized by the combination of an arithmetic unit (hardware) and software (programs) that satisfies all of the functions shown in FIG. 21.

Next, the operation of the decoding system of the third embodiment of the present invention is described, with reference to FIGS. 17 and 21.

As FIG. 17 shows, the input bit stream is separated into encoded image information and header information by the hierarchy separation unit 1201. The encoded image information is input to the decoder group 1203, while the header information is sent to the header decoder 1202, and each is then decoded. The header information decoded by the header decoder 1202 is then input to the decoder group 1203 as control information for each of the functions of the decoder group 1203.

The various processes that occur in the decoder group 1203 will be explained below with reference to FIG. 21.

As FIG. 21 shows, the encoded image data that has been separated by the hierarchy separation unit 1201 is input to the decoding unit 1701, and the control information (header information), decoded by the header decoder 1202, is input to the control unit 1702.

The input control information (header information) is first interpreted by the control unit 1702, and the control specifications needed for decoding, such as the encoding mode and information related to scalability, are input to the decoding unit 1701. In addition to the function of interpreting the control information (header information), the control unit 1702 has the function of monitoring both the processes that occur in the decoding unit 1701 and its memory, so that the operation state of the decoding unit 1701 is also taken into consideration as control information.

The encoded image information undergoes decoding processing by the decoding unit 1701, such as variable length decoding, inverse quantization, and inverse DCT processing, in accordance with the control information (header information) from the control unit 1702. The result of the decoding processing is then sent to the selector 1703.

If the bit stream input into the decoding unit has been encoded by usingscalability, information about the scalability used is generallytransmitted as the header information. Therefore the control informationgenerated by the control unit 1702 is sent to the decoding unit 1701,the selector 1703, as well as the resolution selector 1707. Both thebase layer image and the enhancement layer image are reconstructedaccording to spatial or temporal scalability.

High resolution is basically the default selection for the reconstructedimage. However, there are two cases wherein the control unit 1702 andthe CG construction unit 1205 determine that the decoding process hasfailed. One of such the two cases is that it is determined as result ofthe interpretation of the bit stream header information by the controlunit 1702 that the capabilities of the decoding unit 1701 do not allowfor normal processing. The other case is that the CG construction unit1205 determines that the encoded CG information input to the CGconstruction unit 1205 exceeds its processing capabilities, or ananother request for processing of encoded CG information is receivedduring the processing of encoded CG information by the CG constructionunit 1205. In these two cases, processing of the enhancement layer (highresolution information) is halted regardless of the selection signal1709, and only the base layer is decoded to be output from the selector1708.

In addition, in the case where the control unit 1702 detects or predicts a failure of the bit stream decoding processing (an inability to decode in real time, or an input/output buffer overflow) caused by a rapid increase in the frequency of appearance of intra-frames or intra-macroblocks, processing of the enhancement layer (high resolution information) is halted regardless of the selection signal 1709, and only the base layer image is decoded and output from the selector 1708.
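
A failure predictor of this kind might look like the following sketch; the two thresholds are illustrative assumptions, as the embodiment does not specify numeric limits:

    # Sketch of failure prediction in the control unit 1702.
    INTRA_RATE_LIMIT = 0.5    # hypothetical fraction of intra-coded blocks
    BUFFER_LIMIT = 0.9        # hypothetical input/output buffer fill ratio

    def predict_failure(intra_fraction, buffer_fill):
        """True when real-time decoding failure or overflow is likely."""
        return intra_fraction > INTRA_RATE_LIMIT or buffer_fill > BUFFER_LIMIT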

Also, when the amount of encoded CG information rapidly increases and the software load on the CG construction unit 1205 rises rapidly to exceed its capabilities, the CG construction unit 1205 sends the control unit 1702 a flag indicating that it is unable to continue processing. In this case as well, processing of the enhancement layer (high resolution information) is halted regardless of the selection signal 1709, and only the base layer is decoded and output from the selector 1708.
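
The overload flag can be pictured as follows; the scalar load measure and capacity are assumed abstractions for illustration:

    # Sketch of the overload flag raised by the CG construction unit 1205.
    class CGConstructionUnit:
        def __init__(self, capacity):
            self.capacity = capacity
            self.load = 0.0

        def submit(self, encoded_cg_amount):
            """Accept work; return True if processing cannot continue."""
            self.load += encoded_cg_amount
            # the flag is reported to the control unit 1702
            return self.load > self.capacity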

The reserve power in the decoder group 1203 brought about by halting enhancement layer decoding, that is, the processing capability of the arithmetic apparatus made available for processing of encoded CG information, allows construction of the CG image to be completed normally. The control unit 1702 is programmed to return to normal operation after an N-frame (or N-field) delay time has elapsed from the time when it interprets the input header information and determines that normal processing of the bit stream is possible.
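
The return-to-normal timing can be sketched as a simple countdown; N is the design parameter named in the text, and the class and method names here are hypothetical:

    # Sketch of the N-frame delay before resuming normal operation.
    class FallbackTimer:
        def __init__(self, n_frames):
            self.n_frames = n_frames
            self.remaining = 0

        def trigger(self):
            """Called when a fallback to base-layer-only decoding occurs."""
            self.remaining = self.n_frames

        def on_frame(self, normal_processing_possible):
            """Called once per frame; returns the operating mode."""
            if self.remaining > 0:
                self.remaining -= 1
                return "base_only"
            return "normal" if normal_processing_possible else "base_only"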

As explained above, when a large amount of encoded CG information is input together with a bit stream using scalability and the load on the decoder group 1203 rapidly increases, the control unit 1702 determines that continuation of normal decoding processing is impossible, and operation is set to a fixed mode in which decoding of the enhancement layer image of each object is halted and only the base layer (low resolution) image is output. With this structure of the present invention, the load of decoding operations other than decoding of the encoded CG information can be reduced, and the reserve computing power can be apportioned to the CG construction unit 1205, so that normal decoding operations can be maintained without visible interruption (no image freezes and no loss of objects).
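
The apportioning of reserve power can be expressed as a budget, sketched below with purely illustrative numbers, since the embodiment quantifies no costs:

    # Budget sketch: halting the enhancement layer frees capacity
    # that is apportioned to the CG construction unit 1205.
    TOTAL_BUDGET = 100   # abstract processing units per frame (illustrative)

    def cg_budget(enhancement_halted, base_cost=40, enhancement_cost=50):
        used = base_cost + (0 if enhancement_halted else enhancement_cost)
        return TOTAL_BUDGET - used   # capacity left over for CG construction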

The third embodiment of the present invention is constructed so that the selection signal 1709 is received from outside the system to control the selector 1708. A user therefore has the option of inputting the selection signal 1709 from the outside. If the bit stream (encoded image information) uses spatial scalability, then either high or low spatial resolution may be selected by the selection signal 1709, and if the bit stream uses temporal scalability, then either high or low temporal resolution (frame rate, etc.) may be selected by the selection signal 1709.
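
Mapping the external selection signal to an output resolution is then straightforward, as in the sketch below; the string labels are assumed names for the two scalability types and the resulting choices:

    # Sketch of interpreting the external selection signal 1709.
    def apply_selection(scalability_type, want_high):
        if scalability_type == "spatial":
            return "high_spatial" if want_high else "low_spatial"
        if scalability_type == "temporal":
            return "high_frame_rate" if want_high else "low_frame_rate"
        return "default"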

Next, the process flow that occurs in the decoder of the third embodiment of the present invention will be discussed with reference to FIG. 22.

FIG. 22 is a flowchart showing the processing that takes place in the decoder of the third embodiment according to the present invention.

First, the bit stream is input in a step S201. Then the enhancement layer and base layer images are reconstructed from the input bit stream in a step S202. A step S203 determines whether or not there is a possibility that the decoding processing may fail. If there is a possibility of failure (YES in step S203), then processing proceeds to a step S204, which decodes only the base layer image. If there is no possibility of failure (NO in step S203), then processing proceeds to a step S205, which decodes both the base layer and enhancement layer images.
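
The flow of FIG. 22 can be written compactly as below; the reconstruct, may_fail, and decode helpers are placeholders for the steps named in the flowchart, supplied by the implementation:

    # Sketch of the FIG. 22 flow, steps S201 through S205.
    def decode_flow(bit_stream, reconstruct, may_fail, decode):
        layers = reconstruct(bit_stream)               # S201, S202
        if may_fail(layers):                           # S203: failure possible?
            return decode(layers, base_only=True)      # S204: base layer only
        return decode(layers, base_only=False)         # S205: both layers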

As explained above, in the third embodiment of the present invention, when the bit stream employs scalability, potential buffer overflows and failures in the decoding process (decoding that cannot keep up with the rate of input) are detected so as to immediately halt the enhancement layer and switch to decoding processing of only the base layer image. With this structure, at the temporary sacrifice of temporal or spatial resolution, interruptions in decoding processing, the accompanying corruption of decoded images, freezes, and the like can be avoided.

In addition, by imposing a predetermined delay time between the end of the abnormal operation, in which only the base layer image is decoded by sub-sampling or forced low-resolution decoding, and the return to normal processing, it can be avoided that slight variations in the amount of decoded image data cause repeated switching from normal processing to abnormal processing and back again.

The present invention may be applied to a system constructed of several machines (for example, a host computer, interface unit, reader, printer, etc.), and it may also be applied to a device (for example, a copier, facsimile, etc.) consisting of just one machine.

In addition, it is obvious that an object of the present invention can be realized by supplying a storage medium, in which software program code that can execute the above-described functions is stored, to a system or a device, and causing the computer (or CPU or MPU) of that system or device to read out and execute the stored program.

In this case, the program code read out from the storage medium realizes the functions of the embodiments of the present invention described above, and therefore the storage medium itself constitutes the present invention.

Storage media such as floppy disks, hard disks, optical disks, magneto-optical disks, CD-ROMs, CD-Rs, magnetic tapes, non-volatile memory cards, ROMs, etc. may be used to supply the program code.

Also, it is obvious that the present invention covers not only the case where a computer executes the read-out program code to realize the functions of the embodiments described above, but also the case where an operating system (OS) or the like running on the computer performs a portion of, or all of, the processing that realizes those functions.

In addition, after the program code has been read out from the storage medium and written to a memory in an expansion board inserted into the computer or in an expansion unit connected to the computer, a CPU or the like arranged on the expansion board or in the expansion unit may perform a portion of, or all of, the processing that realizes the functions of the embodiments described above. This also constitutes the present invention.

The foregoing description of embodiments has been given for illustrative purposes only, and is not to be construed as imposing any limitations in any respect. The scope of the invention is, therefore, to be determined solely by the following claims and their legal equivalents, and is not limited by the text of the specification. Alterations made within a scope equivalent to the scope of the claims fall within the true spirit and scope of the invention.

1. An image processing apparatus comprising: a) an inputting unit, arranged to input encoded image data; b) a decoding unit, arranged to decode the encoded image data; c) a controller, arranged to control the decoding process of said decoding unit according to the processing capabilities of said decoding unit, wherein the encoded image data includes image data of a plurality of objects, encoded on an object basis, and said plurality of objects include an image of a hierarchically-encoded object; d) buffer memory means, arranged to buffer image data; and e) detection means, arranged to detect a capacity of said buffer memory means, wherein said controller controls which hierarchy layer of the image data of the hierarchically-encoded object is to be decoded, in accordance with a detection result of said detection means.

2. An apparatus according to claim 1, wherein said controller determines whether or not the decoding process of the encoded image data exceeds the processing capabilities of said decoding unit, and when it is determined that the decoding process exceeds the processing capabilities, said controller controls said decoding unit to decode only a portion of the layers of the encoded image data.

3. An apparatus according to claim 1, further comprising an instructing unit, arranged to instruct a layer that is to be decoded by said decoding unit, wherein said controller controls the decoding process of said decoding unit in accordance with an output of said instructing unit.

4. An image processing method comprising the steps of: inputting encoded image data; decoding the encoded image data; controlling the decoding process of said decoding step according to the encoded image data and decoding processing capabilities of said decoding step, wherein the encoded image data includes image data of a plurality of objects encoded on an object basis and said plurality of objects include an image of a hierarchically-encoded object; buffering image data with buffer memory means; and detecting a capacity of said buffer memory means, wherein said controlling step controls which hierarchy layer of the image data of the hierarchically-encoded object is to be decoded in accordance with a detection result in said detecting step.

5. A storage medium storing a program for executing a decoding process, the process comprising the steps of: inputting encoded image data; decoding the encoded image data; controlling the decoding process of said decoding step according to the encoded image data and decoding processing capabilities of said decoding step, wherein the encoded image data includes image data of a plurality of objects, encoded on an object basis, and said plurality of objects include an image of a hierarchically-encoded object; buffering image data with buffer memory means; and detecting a capacity of said buffer memory means, wherein said controlling step controls which hierarchy layer of the image data of the hierarchically-encoded object is to be decoded in accordance with a detection result in said detecting step.